AD-DROP for Token-Level Tasks
For token-level tasks (e.g., NER and text generation), the model produces several logit outputs, one per token, and each of them yields its own attribution matrix for every attention map. Applying AD-DROP therefore raises the challenge of how to fuse these attribution matrices. The results on the test sets are reported in Table 1 and Table 2. Moreover, to verify that AD-DROP can be adapted to other pre-trained models, we choose ELECTRA as the base model for CoNLL-2003 NER and OPUS-MT for WMT2016. We discuss potential limitations of AD-DROP as follows.
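The fusion problem can be made concrete with a small NumPy sketch. The text above does not state how the per-output attribution matrices are combined, so the fusion rules offered here (mean, sum, max) are assumptions, and all function names (`fuse_attributions`, `candidate_region`, `ad_drop_mask`, `masked_softmax`) are hypothetical. The sketch also assumes the general AD-DROP recipe: in each row of the fused attribution matrix, the top-`p` fraction of positions forms a candidate discard region, each candidate is dropped independently with probability `q`, and dropped positions receive a large negative value in an additive mask on the attention logits.

```python
import numpy as np

# Large negative additive mask value; used instead of -inf so that a fully
# masked row yields a uniform softmax rather than NaN.
NEG = -1e9


def fuse_attributions(attrs, how="mean"):
    """Fuse per-output attribution matrices into a single matrix.

    attrs: array of shape (num_outputs, L, L), one attribution matrix per
    token-level logit output for one attention map (hypothetical layout).
    The fusion rule is an assumption, not taken from the paper.
    """
    attrs = np.asarray(attrs, dtype=float)
    if how == "mean":
        return attrs.mean(axis=0)
    if how == "sum":
        return attrs.sum(axis=0)
    if how == "max":
        return attrs.max(axis=0)
    raise ValueError(f"unknown fusion rule: {how!r}")


def candidate_region(attr, p):
    """Mark, in each row, the top-p fraction of positions by attribution."""
    n_cols = attr.shape[-1]
    k = max(1, int(round(p * n_cols)))  # at least one candidate per row
    top = np.argsort(-attr, axis=-1)[..., :k]  # indices of the k largest values
    mask = np.zeros(attr.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return mask


def ad_drop_mask(attr, p, q, rng):
    """Additive mask: NEG at candidate positions, each dropped with probability q."""
    candidates = candidate_region(attr, p)
    dropped = candidates & (rng.random(attr.shape) < q)
    return np.where(dropped, NEG, 0.0)


def masked_softmax(scores, mask):
    """Row-wise softmax of attention scores after adding the mask."""
    z = scores + mask
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    per_output = rng.normal(size=(3, 4, 4))  # 3 token-level outputs, L = 4
    fused = fuse_attributions(per_output, how="mean")
    mask = ad_drop_mask(fused, p=0.25, q=0.5, rng=rng)
    probs = masked_softmax(rng.normal(size=(4, 4)), mask)
    print(mask)
    print(probs.round(3))
```

Averaging keeps the fused matrix on the same scale as a single attribution matrix, whereas `max` lets the strongest attribution from any single token dominate the candidate region; which rule works best for a given task is an empirical question this sketch does not settle.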